Scaling in Tournaments 
We study a stochastic process that mimics single-game elimination tournaments. In our model, the outcome of each match is stochastic: the weaker player wins with upset probability $q < 1/2$, and the stronger player wins with probability $1 - q$. The loser is eliminated. Extremal statistics of the initial distribution of player strengths govern the tournament outcome. For a uniform initial distribution of strengths, the rank of the winner, $x_*$, decays algebraically with the number of players, $N$, as $x_* \sim N^{-\beta}$. Different decay exponents are found analytically for sequential dynamics, $\beta_{\rm seq} = 1 - 2q$, and parallel dynamics, $\beta_{\rm par} = 1 + \ln(1-q)/\ln 2$. The distribution of player strengths becomes self-similar in the long time limit with an algebraic tail. Our theory successfully describes statistics of the US college basketball national championship tournament.

A wide variety of processes in nature and society involve competition. In animal societies, competition is responsible for social differentiation and the emergence of social strata. Competition is also ubiquitous in human society: auctions, election of public officials, city plans, grant awards, and sports involve competition. Minimalist, physics-based competition processes have been recently developed to model relevant competitive phenomena such as wealth distributions, auctions, social dynamics, games, and sports leagues. In physics, competition also underlies phase ordering kinetics, in which large domains grow at the expense of small domains that eventually are eliminated.

In this study, we investigate $N$-player tournaments with head-to-head matches. The winner of each match remains in the tournament while the loser is eliminated. At the end of a tournament, a single undefeated player, the tournament winner, remains. Each player is endowed with a fixed intrinsic strength $x \geq 0$ that is drawn from a normalized distribution $f_0(x)$. We define strength so that smaller $x$ corresponds to a stronger player and we henceforth refer to this strength measure as "rank".

The result of competition is stochastic: in each match the weaker player wins with the upset probability $q < 1/2$ and the stronger player wins with probability $p = 1 - q$. Schematically, when two players with ranks $x_1$ and $x_2$ compete, assuming $x_1 < x_2$, the outcome is:

$$(x_1, x_2) \to \begin{cases} x_1 & \text{with probability } 1 - q, \\ x_2 & \text{with probability } q. \end{cases} \qquad (1)$$

For $q = 0$, the best player is always victorious, while for $q = 1/2$, game outcomes are completely random. We are interested in the evolution of the rank distribution, as well as the rank of the tournament winner.
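
As an illustration of the match rule (1), the following minimal Python sketch plays a single game between two ranks. The function name `play_match` and the optional `rng` argument are choices made here for illustration and are not part of the model definition.

```python
import random

def play_match(x1, x2, q, rng=random):
    """Return the rank of the winner of one match.

    Smaller rank means a stronger player. The weaker player
    (larger rank) wins with upset probability q, as in Eq. (1).
    """
    strong, weak = (x1, x2) if x1 < x2 else (x2, x1)
    # rng.random() is uniform on [0, 1), so an upset occurs with probability q.
    return weak if rng.random() < q else strong

# Example: estimate the upset frequency for q = 0.25.
rng = random.Random(0)
upsets = sum(play_match(0.2, 0.7, 0.25, rng) == 0.7 for _ in range(10000))
print("observed upset fraction:", upsets / 10000)
```

With $q = 0$ the function always returns the smaller rank, and the observed upset fraction should approach $q$ as the number of games grows.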

We find that the rank of the winner, $x_*$, decays algebraically with the number of players $N$ as

$$x_* \sim N^{-\beta}, \qquad (2)$$

with the exponent $\beta = \beta(q)$ a function of the upset probability. When the ranks of the tournament players are uniformly distributed, we find different values for sequential and parallel dynamics: $\beta_{\rm seq} = 1 - 2q$ and $\beta_{\rm par} = 1 + \ln(1-q)/\ln 2$. Moreover, the rank distribution becomes asymptotically self-similar and has a power-law tail. We also extend these results to arbitrary initial distributions. The extreme of this distribution governs statistical properties of the rank of the ultimate winner.

Sequential Dynamics. We formulate the competition process by assuming that each pair of players competes at a constant rate. In this formulation, games are held sequentially, and players are eliminated from the tournament one at a time. The fraction of players remaining in the competition at time $t$, $c(t)$, decays according to

$$\frac{dc}{dt} = -c^2. \qquad (3)$$

Solving this equation subject to the initial condition $c(0) = 1$, the surviving fraction is

$$c(t) = (1+t)^{-1}. \qquad (4)$$

The tournament ends with a single player, and this occurs at time $t_*$, which can be estimated from $c(t_*) \sim N^{-1}$. Therefore the time to complete the competition scales linearly with the number of players, $t_* \sim N$.

Let $f(x,t)\,dx$ be the fraction of remaining players with rank in the range $(x, x+dx)$ at time $t$. The density $f(x,t)$ obeys the nonlinear integro-differential equation

$$\frac{\partial f(x)}{\partial t} = -2p\,f(x)\int_0^x dy\,f(y) - 2q\,f(x)\int_x^\infty dy\,f(y). \qquad (5)$$

The first term accounts for games where the favorite wins and the second term for games where the underdog wins. The initial condition is $f(x,0) = f_0(x)$ with $\int dx\,f_0(x) = 1$. Integrating (5), the total fraction of remaining players, $c(t) = \int dx\,f(x,t)$, indeed decays according to (3). We note that this master equation is exact in the limit of an infinite number of players and applicable only as long as the fraction of remaining players is finite.

The rank distribution can be obtained by introducing the cumulative distribution $F(x)$, defined as the fraction of players with rank smaller than $x$,

$$F(x) = \int_0^x dy\,f(y). \qquad (6)$$

The distribution of player ranks is obtained from the cumulative distribution by differentiation, $f(x) = dF(x)/dx$. By integrating the master equation (5), the cumulative distribution obeys the closed nonlinear equation

$$\frac{\partial F}{\partial t} = (2q-1)F^2 - 2qcF. \qquad (7)$$

The initial condition is $F(x,0) = F_0(x) = \int_0^x dy\,f_0(y)$. Substituting $H(x) = 1/F(x)$, we transform to the linear equation

$$\frac{\partial H}{\partial t} = (1-2q) + 2qcH. \qquad (8)$$

Integrating this equation with respect to time, we find $H(x) = [H_0(x) - 1](1+t)^{2q} + (1+t)$. Substituting the initial condition $H_0(x) = 1/F_0(x)$, we obtain the cumulative rank distribution

$$F(x,t) = \frac{F_0(x)}{[1-F_0(x)](1+t)^{2q} + F_0(x)(1+t)}. \qquad (9)$$

From this, the actual density of player rank is obtained by differentiation

$$f(x,t) = \frac{f_0(x)(1+t)^{2q}}{\left\{[1-F_0(x)](1+t)^{2q} + F_0(x)(1+t)\right\}^2}. \qquad (10)$$

Notice that when the game outcome is random, $q = 1/2$, the normalized distribution of rank does not evolve with time as $f(x,t)/c(t) = f_0(x)$.
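
The closed form (9) can be checked numerically against a direct integration of the rate equation (7). The sketch below is one way to do this, assuming a classical fourth-order Runge-Kutta integrator; the function names and the step count are illustrative choices, not part of the original analysis.

```python
def exact_F(F0, t, q):
    """Closed-form cumulative distribution, Eq. (9), given F0 = F_0(x)."""
    return F0 / ((1.0 - F0) * (1.0 + t) ** (2 * q) + F0 * (1.0 + t))

def rhs(F, t, q):
    """Right-hand side of Eq. (7) with c(t) = 1/(1+t) from Eq. (4)."""
    c = 1.0 / (1.0 + t)
    return (2 * q - 1) * F * F - 2 * q * c * F

def integrate_F(F0, t_end, q, steps=20000):
    """Integrate Eq. (7) from t = 0 to t_end with classical RK4."""
    h = t_end / steps
    F, t = F0, 0.0
    for _ in range(steps):
        k1 = rhs(F, t, q)
        k2 = rhs(F + 0.5 * h * k1, t + 0.5 * h, q)
        k3 = rhs(F + 0.5 * h * k2, t + 0.5 * h, q)
        k4 = rhs(F + h * k3, t + h, q)
        F += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        t += h
    return F

# Compare the two at a fixed rank (F_0(x) = 0.3), time t = 10, and q = 0.25.
print(exact_F(0.3, 10.0, 0.25), integrate_F(0.3, 10.0, 0.25))
```

Because (7) is an ordinary differential equation in $t$ at each fixed $x$, the comparison can be made one rank at a time. Setting $F_0 = 1$ in `exact_F` recovers the total density $c(t) = 1/(1+t)$.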

Uniform Initial Distribution. Consider first the special case of a uniform initial distribution, $f_0(x) = 1$ for $0 < x < 1$, and deterministic games, $q = 0$. Then the initial cumulative distribution is $F_0(x) = x$ for $x < 1$ and $F_0(x) = 1$ for $x > 1$. The time-dependent cumulative distribution (9) is

$$F(x,t) = \frac{x}{1+xt} \qquad (11)$$

for $x < 1$ and $F(x,t) = c(t)$ for $x > 1$. Similarly, the rank distribution itself is $f(x,t) = (1+xt)^{-2}$ for $0 < x < 1$. As expected, weaker players are more likely to be eliminated as the tournament proceeds and the remaining field becomes stronger. Quantitatively, the average rank of surviving players, $\langle x \rangle = \int dx\,x f(x) / \int dx\,f(x)$, is

$$\langle x \rangle = t^{-2}\left[(1+t)\ln(1+t) - t\right]. \qquad (12)$$

Therefore, the average rank asymptotically decays with time, $\langle x \rangle \simeq t^{-1}\ln t$.
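
The average (12) follows from elementary integrals of $f(x,t) = (1+xt)^{-2}$ over $0 < x < 1$. As a sanity check, the sketch below compares the formula with a simple midpoint-rule quadrature; the function names and the grid size are assumptions made for this illustration.

```python
import math

def mean_rank_formula(t):
    """Eq. (12): average rank of survivors for uniform f_0 and q = 0."""
    return ((1.0 + t) * math.log(1.0 + t) - t) / t**2

def mean_rank_quadrature(t, n=200000):
    """Midpoint-rule estimate of int x f dx / int f dx, f = (1 + x t)^-2 on [0, 1]."""
    h = 1.0 / n
    num = den = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        f = (1.0 + x * t) ** -2
        num += x * f
        den += f
    return num / den

for t in (1.0, 10.0, 100.0):
    print(t, mean_rank_formula(t), mean_rank_quadrature(t))
```

The two columns should agree to many digits, and the average rank decreases with time as the field becomes stronger.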

We can write the cumulative distribution in the scaling form $F(x,t) \to t^{-1}\Phi(xt)$, by multiplying and dividing (11) by time. Here, the scaling function is $\Phi(z) = z/(1+z)$, which approaches unity, $\Phi(z) \to 1$, when $z \to \infty$, consistent with the total density decay $c \simeq t^{-1}$. In the long time limit, the cumulative distribution retains the same shape as the initial distribution, $\Phi(z) \simeq z$, for $z \ll 1$. The scaling variable $z = xt$ indicates that players with rank larger than the characteristic rank $x \sim t^{-1}$ are eliminated from the tournament.

Let us generalize these results to arbitrary $q$. In this case, the cumulative distribution is

$$F(x,t) = \frac{x}{(1-x)(1+t)^{2q} + x(1+t)} \qquad (13)$$

for $x < 1$ and $F(x,t) = c(t)$ otherwise. In the long time limit, we may replace $1 + t$ with $t$, and also replace $1 - x$ with $1$, since the rank decays with time. Then the cumulative distribution approaches the scaling form

$$F(x,t) \to t^{-1}\Phi\left(xt^{1-2q}\right). \qquad (14)$$

The scaling function remains as above,

$$\Phi(z) = \frac{z}{1+z}. \qquad (15)$$

The scaling form (14) implies that the typical rank decays algebraically with time

$$x \sim t^{-(1-2q)}. \qquad (16)$$

Interestingly, the exponent governing this decay depends on the upset probability. The larger the upset probability, the smaller the decay exponent. Thus weaker players can persist in a tournament when $q$ approaches $1/2$. For completely random games, $q = 1/2$, the exponent vanishes and the strength of the typical surviving player does not change with time.

A similar scaling law characterizes the rank of the tournament winner. From (4), the number of players remaining in the tournament, $M$, and the initial number of players, $N$, are related by $t \simeq N/M$. Using (16), when $M$ players remain, the typical rank is $x \sim (N/M)^{-(1-2q)}$. Substituting $M = 1$, we find that the typical rank of the winner decays algebraically with the total number of players, as in (2), with the exponent

$$\beta_{\rm seq} = 1 - 2q. \qquad (17)$$

Therefore, the smaller the tournament or the higher the upset probability, the weaker the winner, on average. We note that due to strong fluctuations, the master equation (5) is not applicable when the number of players is of order one, and consequently, our theoretical framework cannot be used to obtain the distribution of the tournament winner.
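
A direct Monte Carlo simulation offers one way to probe the winner's rank where the rate-equation description breaks down. The sketch below is a minimal version under stated assumptions: at each step two distinct survivors are picked uniformly at random (mimicking equal rates for all pairs), ranks are drawn uniformly on $[0, 1]$, and the typical rank is estimated by the median over runs. All function names are hypothetical.

```python
import random

def sequential_winner(ranks, q, rng):
    """Run one sequential tournament and return the winner's rank.

    Assumption of this sketch: each game is between two distinct
    survivors chosen uniformly at random.
    """
    alive = list(ranks)
    while len(alive) > 1:
        i, j = rng.sample(range(len(alive)), 2)
        strong, weak = (i, j) if alive[i] < alive[j] else (j, i)
        # The stronger player is eliminated only in an upset (probability q).
        loser = strong if rng.random() < q else weak
        # Remove the loser in O(1) by overwriting it with the last entry.
        alive[loser] = alive[-1]
        alive.pop()
    return alive[0]

def median_winner_rank(n_players, q, n_runs, seed=0):
    """Median winner rank over n_runs tournaments with uniform ranks."""
    rng = random.Random(seed)
    winners = sorted(
        sequential_winner([rng.random() for _ in range(n_players)], q, rng)
        for _ in range(n_runs)
    )
    return winners[len(winners) // 2]

for n in (16, 64, 256):
    print(n, median_winner_rank(n, 0.25, 400))
```

If the scaling law (2) with exponent (17) holds, the printed medians should fall off roughly as $N^{-(1-2q)}$, up to statistical noise and finite-size corrections.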

General Initial Distributions. Our findings in the case of uniform distributions suggest that the behavior of the initial distribution in the $x \to 0$ limit governs the long time asymptotics. Let us consider rank distributions with a power-law behavior near the origin,

$$F_0(x) \simeq Cx^{\mu+1}, \qquad (18)$$

as $x \to 0$ with $\mu > -1$ so that the distribution is normalized. The rank density then scales as $f_0(x) \simeq C(\mu+1)x^{\mu}$. Since the rank $x$ decays with time, the term $[1-F_0(x)](1+t)^{2q}$ in the denominator of (9) can be replaced by $t^{2q}$ and similarly, the term $F_0(x)(1+t)$ can be replaced by $Cx^{\mu+1}t$. Therefore, the scaling form (14) becomes $F(x,t) \to t^{-1}\Phi\left(xt^{(1-2q)/(\mu+1)}\right)$, with the scaling function $\Phi(z) = Cz^{\mu+1}/(1+Cz^{\mu+1})$. Thus the typical player rank decays with time according to $x \sim t^{-(1-2q)/(\mu+1)}$. Similarly, the rank of the winner decays with the number of players as in (2) with $\beta_{\rm seq} = (1-2q)/(\mu+1)$.

Like the cumulative distribution, the density of players with given rank also becomes self-similar asymptotically, $f(x,t) \to t^{\beta-1}\phi(xt^{\beta})$ with $\beta = (1-2q)/(\mu+1)$ and $\phi(z) = \Phi'(z)$. As noted earlier, the shape of the distribution is preserved: $\phi(z) \sim z^{\mu}$ as $z \to 0$. The large argument behavior is

$$\phi(z) \sim z^{-\mu-2} \qquad (19)$$

as $z \to \infty$. The algebraic decay shows that the likelihood of finding weak players in the tournament is appreciable. Surprisingly, when initially most players are strong they can eliminate each other, leading to an appreciable probability for weak players to survive.

The scaling behavior (2) refers to the typical rank of the winner. The algebraic tail (19) suggests that the average rank may scale differently than the typical rank. For example, for compact uniform distributions ($\mu = 0$), the average is characterized by a logarithmic correction as in (12), $\langle x_* \rangle \sim N^{-(1-2q)}\ln N$.

Parallel Dynamics. Thus far, we addressed sequential games with a single team eliminated at a time. However, actual sports tournaments typically proceed via rounds of parallel play with half of the teams eliminated in each round. We thus consider such round-play tournaments with $N = 2^k$ players. Let $g_N(x)$ be the normalized distribution of the rank of the winner with $\int dx\,g_N(x) = 1$ and let $G_N(x) = \int_0^x dy\,g_N(y)$ be the corresponding cumulative distribution.

Consider first a tournament with $N = 2$ players. Similar to Eq. (5), the rank distribution of the winner is

$$g_2(x) = 2p\,g_1(x)[1 - G_1(x)] + 2q\,g_1(x)G_1(x). \qquad (20)$$

Integrating this equation, we arrive at an explicit expression for the distribution of the rank of the winner, $G_2(x) = 2pG_1(x) + (1-2p)[G_1(x)]^2$. Clearly, this nonlinear recursion relation applies to every round of the tournament and therefore,

$$G_{2N}(x) = 2pG_N(x) + (1-2p)[G_N(x)]^2. \qquad (21)$$

FIG. 1: The decay exponent $\beta$ versus the upset probability $q$. Shown are the values for the sequential case (17) and the parallel case (24).

Iterating this equation starting with $G_1(x)$, we obtain explicit expressions for the distribution of the winner for $N = 2, 4, 8, \ldots$ Closed-form expressions can be obtained for the extreme cases of deterministic competitions ($q = 0$), where $1 - G_N(x) = [1 - G_1(x)]^N$, and random competitions ($q = 1/2$), where $G_N(x) = G_1(x)$.

Let us restrict our attention to uniform initial distributions, $G_1(x) = x$ for $x < 1$. For small $x$, we may neglect the nonlinear term in (21) and then $G_2(x) \simeq (2p)x$, $G_4(x) \simeq (2p)^2 x$, and in general

$$G_{2^k}(x) \simeq (2p)^k x. \qquad (22)$$

To obtain the asymptotic behavior, we substitute $k = \ln N/\ln 2$ into (22), and then $G_N(x) \simeq xN^{\beta}$ with $\beta = 1 + \ln p/\ln 2$. Therefore, the cumulative distribution of the rank of the winner follows the scaling form

$$G_N(x) \to \Psi\left(xN^{\beta}\right) \qquad (23)$$

when $N \to \infty$. The scaling function is linear, $\Psi(z) \simeq z$, in the limit $z \to 0$, reflecting that the extremal statistics are invariant under the competition dynamics.

The scaling form (23) shows that the rank of the tournament winner decays algebraically with the tournament size as in (2). Surprisingly, the decay exponent

$$\beta_{\rm par} = 1 + \frac{\ln p}{\ln 2} \qquad (24)$$

for parallel dynamics differs from the decay exponent (17) for sequential dynamics. The two exponents coincide in the extreme cases, $\beta(0) = 1$ and $\beta(1/2) = 0$. The inequality $\beta_{\rm par} > \beta_{\rm seq}$ (figure 1) shows that parallel play benefits the strong players. Indeed, in sequential play weak players may survive by being idle. The source of this discrepancy is fluctuations in the number of games. In sequential dynamics, the number of games played by each player is variable while in parallel dynamics the number of games is fixed.
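
The round-by-round recursion (21) and the two exponents (17) and (24) are simple to evaluate numerically. The following sketch, with function names chosen here for illustration, iterates the recursion at a fixed rank and tabulates both exponents for comparison.

```python
import math

def beta_seq(q):
    """Sequential exponent, Eq. (17)."""
    return 1.0 - 2.0 * q

def beta_par(q):
    """Parallel exponent, Eq. (24), with p = 1 - q."""
    return 1.0 + math.log(1.0 - q) / math.log(2.0)

def winner_cdf(G1, rounds, q):
    """Iterate Eq. (21) `rounds` times starting from the value G1 = G_1(x)."""
    p = 1.0 - q
    G = G1
    for _ in range(rounds):
        G = 2.0 * p * G + (1.0 - 2.0 * p) * G * G
    return G

for q in (0.0, 0.1, 0.25, 0.4, 0.5):
    print(q, beta_seq(q), beta_par(q))

# Small-x check of Eq. (22): G_{2^k}(x) / x should be close to (2p)^k.
x, k, q = 1e-9, 10, 0.25
print(winner_cdf(x, k, q) / x, (2 * (1 - q)) ** k)
```

For $q = 0$ the iteration reproduces $1 - G_N = (1 - G_1)^N$, and for $q = 1/2$ it leaves $G_1$ unchanged, matching the two limiting cases quoted above.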

FIG. 2: The cumulative distribution of the rank of the group winner $G_{16}(x)$. The empirical distribution for college basketball (circles) is compared with Monte Carlo simulations (squares), and the parallel dynamics theory (diamonds).



Substituting the scaling form (23) into the recursion (21), the scaling function obeys the nonlinear and nonlocal equation

$$\Psi(2pz) = 2p\Psi(z) + (1-2p)\Psi^2(z). \qquad (25)$$

The boundary conditions are $\Psi(0) = 0$ and $\Psi(\infty) = 1$. An exact solution is feasible only when there are no upsets: $\Psi(z) = 1 - e^{-z}$ for $q = 0$. Otherwise, we perform an asymptotic analysis. As shown above, the small-$z$ behavior is generic, $\Psi(z) \simeq z$. At large arguments, we write $U(z) = 1 - \Psi(z)$ and since $U \ll 1$, we can neglect the nonlinear terms and then $U(2pz) = 2qU(z)$. This implies the algebraic decay $U(z) \sim z^{\ln 2q/\ln 2p}$. As a result, the likelihood of finding weak winners, $g_N(x) \to N^{\beta}\psi(xN^{\beta})$ with $\psi(z) = \Psi'(z)$, decays algebraically

$$\psi(z) \sim z^{\frac{\ln 2q}{\ln 2p} - 1} \qquad (26)$$

as $z \to \infty$. This algebraic behavior is very different from the exponential decay $\psi(z) = e^{-z}$ for deterministic games. In contrast to sequential play, the exponent depends on the upset probability. This large likelihood of finding weak winners reflects that the number of games played by the tournament winner scales logarithmically with the number of teams. For example, as $N = 2^k$, the likelihood that the weakest player wins, $q^k = N^{\ln q/\ln 2}$, is appreciable as it decays only algebraically with $N$.
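
The identity $q^k = N^{\ln q/\ln 2}$ and the tail exponent in (26) can be evaluated directly. The sketch below is illustrative only; the function names are assumptions, and the tail exponent is meaningful for $0 < q < 1/2$.

```python
import math

def weakest_wins_probability(n_players, q):
    """Probability q^k that the weakest player wins all k = log2(N) games."""
    k = int(round(math.log2(n_players)))
    if 2 ** k != n_players:
        raise ValueError("n_players must be a power of two")
    return q ** k

def power_law_form(n_players, q):
    """Equivalent algebraic form N^(ln q / ln 2)."""
    return n_players ** (math.log(q) / math.log(2.0))

def tail_exponent(q):
    """Exponent of the large-z tail of psi(z) in Eq. (26), for 0 < q < 1/2."""
    return math.log(2.0 * q) / math.log(2.0 * (1.0 - q)) - 1.0

for n in (16, 64, 1024):
    print(n, weakest_wins_probability(n, 0.25), power_law_form(n, 0.25))
print("tail exponent at q = 0.25:", tail_exponent(0.25))
```

The two forms agree for every power-of-two $N$, and the tail exponent becomes more negative as $q$ decreases, consistent with the crossover to exponential decay at $q = 0$.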

Empirical Study. To test our theoretical approach, we studied the US men's NCAA college basketball national championship where 64 teams are divided into 4 groups of 16, with teams in each group ranked 1 (best) to 16 (worst). The winner of each group advances to the "final four". As in the parallel dynamics, half of the teams are eliminated in each round. The schedule, however, is not random: the games are arranged so that if there are no upsets the bottom half is eliminated in each round. We analyzed the results of all 1680 games since this format was established (1979-2006). We calculated the cumulative rank distribution of the team advancing to the final four, $G_{16}(x)$, with $x = 1, 2, \ldots, 16$ (figure 2). Additionally, we measured the upset frequency $q = 0.275$ by counting the number of games won by the underdog [12].

To compare with the theoretical model, we simulated 
the NCAA tournament schedule in which the lower- 
ranked team wins with upset probability q. The parame- 
ter q was treated as a tunable variable, and we present re- 
sults for the value that best matched the empirical data. 
The simulation results produce a rank distribution that 
agrees well with the empirical findings (figure 2). The fitted upset probability $q = 0.22$ is close to the observed frequency. Alternatively, we modeled the data by iterating (21) starting with the uniform distribution $G_1(x) = x/16$ using a fitted upset probability of $q = 0.175$ (the theory assumes a random schedule and an approximately uniform distribution). We thus found that the competition model
has predictive power that quantitatively captures empir- 
ical rank distributions, and enables estimates of upset 
frequencies from observed rank distributions. 
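
For a fixed bracket, the winner distribution can also be computed exactly by propagating probabilities through the rounds, rather than by Monte Carlo sampling as in the simulations described above. The sketch below takes this alternative route. It assumes the customary first-round pairing order for a 16-team group (1 vs 16, 8 vs 9, and so on), which is consistent with the stated property that the bottom half is eliminated in each round when there are no upsets; the bracket list and function names are assumptions of this illustration.

```python
def regional_winner_distribution(q, bracket=None):
    """Exact distribution of the seed that wins a 16-team elimination group.

    Assumption of this sketch: first-round pairings follow the order
    below and every game is an upset with the same probability q.
    Returns a dict mapping seed -> probability of winning the group.
    """
    if bracket is None:
        bracket = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]
    # Each slot holds a distribution over which seed currently occupies it.
    slots = [{seed: 1.0} for seed in bracket]
    while len(slots) > 1:
        next_round = []
        for a, b in zip(slots[0::2], slots[1::2]):
            merged = {}
            for sa, pa in a.items():
                for sb, pb in b.items():
                    strong, weak = (sa, sb) if sa < sb else (sb, sa)
                    merged[strong] = merged.get(strong, 0.0) + pa * pb * (1.0 - q)
                    merged[weak] = merged.get(weak, 0.0) + pa * pb * q
            next_round.append(merged)
        slots = next_round
    return slots[0]

def cumulative(dist):
    """Cumulative distribution G_16(x) for x = 1, ..., 16."""
    total, out = 0.0, []
    for seed in range(1, 17):
        total += dist.get(seed, 0.0)
        out.append(total)
    return out

dist = regional_winner_distribution(0.22)
for seed, G in enumerate(cumulative(dist), start=1):
    print(seed, round(G, 4))
```

In this setup the top seed wins with probability $(1-q)^4$ and the bottom seed with probability $q^4$, since each must win four games as the favorite or the underdog, respectively.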

In summary, we studied dynamics of single-elimination 
tournaments, in which there is a finite probability for a 
lower-ranked player to upset a higher-ranked player. We 
obtained an exact solution for the distribution of player 
ranks for arbitrary initial conditions. Generally, the like- 
lihood of upset winners is relatively large since the tail of 
the distribution function decays algebraically with rank. 
The characteristic rank of the winning player decays al- 
gebraically with the number of players and the larger 
the upset probability, the slower this decay (small tour- 
naments are more likely to produce a surprise winner). 
Different decay exponents are found for sequential and 
parallel play with the latter generally larger (weak players 
fare better by avoiding competition). We demonstrated 
the utility of this model using college basketball results. 

Extreme properties of the initial distribution fully govern the asymptotic behavior. In the long time limit, the player distribution becomes self-similar. Both the form of the scaling distribution and the time dependence of the characteristic rank depend only on the small-$x$ behavior of the initial distribution. A similar phenomenology where extremal statistics govern long-time asymptotics was found in studies of clustering in traffic flows [16] and species abundance in biological evolution.